Support tiered data storage#692

Open
enigbe wants to merge 12 commits into lightningdevkit:main from enigbe:2025-10-tiered-data-storage

Conversation

@enigbe
Contributor

@enigbe enigbe commented Nov 1, 2025

What this PR does

In this PR we introduce TierStore, a three-tiered (KVStore + KVStoreSync) implementation that routes data across distinct storage layers based on criticality.

Background

As we have moved towards supporting remote storage with VssStore, we need to recognize that not all data has the same storage requirements. Currently, all data goes to a single store, which creates some problems:

  • Remote storage adds latency on every read/write operation
  • Network graph and scorer data is ephemeral and easily rebuildable from the network
  • Persisting ephemeral data to remote storage wastes bandwidth and adds unnecessary latency
  • Users persisting to remote storage have no local disaster recovery option

This PR proposes tiered storage that provides granular control over where different data types are stored. The tiers include:

  • Ephemeral store (network graph, scorer): optional fast local storage to avoid remote latency
  • Primary store (channels, payments, wallet): preferably remote storage for critical data
  • Backup store: optional (preferably local) backup for disaster recovery, with data written to and removed from it lazily to avoid blocking primary data operations

Additionally, we permit configuring the Node with tiered storage, allowing callers to:

  • set retry parameters,
  • set backup and ephemeral stores, and to
  • build the Node with a primary store.
    These configuration options also extend to our foreign interface, allowing bindings targets to build the Node with their own (KVStore + KVStoreSync) implementations. A sample Python implementation is provided and tested.
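
A rough sketch of the resulting builder flow (set_backup_store and build_with_tier_store appear in this changeset; the ephemeral-store setter name and the surrounding types are assumed for illustration):

use std::sync::Arc;

use ldk_node::builder::NodeBuilder;
use ldk_node::{BuildError, DynStore, Node, NodeEntropy};

fn build_tiered_node(
    node_entropy: NodeEntropy, primary: Arc<DynStore>, backup: Arc<DynStore>,
    ephemeral: Arc<DynStore>,
) -> Result<Node, BuildError> {
    let mut builder = NodeBuilder::new();
    // Network graph and scorer go to fast local storage.
    builder.set_ephemeral_store(ephemeral);
    // Optional (preferably local) mirror for disaster recovery.
    builder.set_backup_store(backup);
    // Critical data (channels, payments, wallet) lives in the primary store.
    builder.build_with_tier_store(node_entropy, primary)
}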

Concerns

  1. Nested Retry Logic: VssStore has built-in retry logic. Wrapping it in TierStore creates nested retries.
  2. Backup Queue Capacity: Currently hard-coded (100 in prod, 5 in tests). Ideally this would be configurable, but there are concerns about exposing it to callers. What is a sensible default?
  3. Data Consistency Between Primary and Backup: Backup operates asynchronously and may lag behind or miss operations if the queue overflows. Should we consider adding a background sync task to reconcile backup with primary? Expose metrics for backup lag/queue depth? Implement catch-up logic on startup?
  4. Exposing KVStore to the FFI

Related Issues

@ldk-reviews-bot

ldk-reviews-bot commented Nov 1, 2025

👋 Thanks for assigning @tnull as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

@ldk-reviews-bot ldk-reviews-bot requested a review from tnull November 1, 2025 23:18
@ldk-reviews-bot

🔔 1st Reminder

Hey @tnull! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

@enigbe enigbe force-pushed the 2025-10-tiered-data-storage branch 4 times, most recently from 29f47f3 to 264aa7f Compare November 4, 2025 22:07
@enigbe enigbe force-pushed the 2025-10-tiered-data-storage branch from 264aa7f to 493dd9a Compare December 2, 2025 19:50
@enigbe enigbe force-pushed the 2025-10-tiered-data-storage branch 2 times, most recently from a30cbfb to 1e7bdbc Compare December 4, 2025 23:30
Collaborator

@tnull tnull left a comment


Thanks for looking into this and excuse the delay here!

I did a first pass; generally this looks already pretty good, but it is a huge PR and in some areas could be simplified. For instance, we should drop the generic Retry logic as the concrete implementations would already implement that if they need it. Likewise, we shouldn't fall back to the backup for now as it's really only meant as disaster recovery (KVStore caching is out-of-scope for this PR now, even though we might want to explore that soon, too). There is also no need to replicate the write-ordering locks in TierStore, but we'll need them in ForeignKVStoreAdapter (which might be mergeable with DynStore?).

Generally, if you find opportunities to reduce the size of the changeset here it would be appreciated. It would also be cool if you could try to break up the PR into more feature commits, but feel free to leave as is if not.

Comment thread src/io/tier_store.rs Outdated

/// Configuration for exponential backoff retry behavior.
#[derive(Debug, Copy, Clone)]
pub struct RetryConfig {
Collaborator

Retrying is pretty specific to the particular KVStore implementation. I don't think we should expose retrying params on top of what we already do for implementations internally.

Contributor Author

Sure thing. This has been dropped.
I considered it because users can bring their own KVStore implementation without any retry handling, so I thought to add some level of control, but the added complexity and size of the changeset might not be worth it.

Comment thread src/io/tier_store.rs Outdated
}

pub struct TierStoreInner {
/// For remote data.
Collaborator

nit: Might also be local.

Contributor Author

Updated and addressed here db1fe83

Comment thread bindings/ldk_node.udl Outdated
[Throws=BuildError]
Node build_with_vss_store_and_header_provider(NodeEntropy node_entropy, string vss_url, string store_id, VssHeaderProvider header_provider);
[Throws=BuildError]
Node build_with_tier_store(NodeEntropy node_entropy, DynStore primary_store);
Collaborator

Can we now also expose Builder::build_with_store?

Contributor Author

Yes we can.
This is exposed here 451a392

Comment thread src/ffi/types.rs Outdated
Self { inner: Arc::new(adapter) }
}

pub fn from_ldk_store(store: Arc<dyn LdkSyncAndAsyncKVStore + Send + Sync>) -> Arc<Self> {
Collaborator

If we rather make this an impl From<Arc<dyn LdkSyncAndAsyncKVStore + Send + Sync>> for Arc<..>, we can drop the wrap_store! macro and just use .into() instead.
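
For illustration, a minimal sketch of that conversion (the wrapper and field names are assumed from the snippet above), after which call sites can just use .into():

impl From<Arc<dyn LdkSyncAndAsyncKVStore + Send + Sync>> for Arc<KVStore> {
    fn from(store: Arc<dyn LdkSyncAndAsyncKVStore + Send + Sync>) -> Self {
        // Wrap the foreign store in the adapter once, here, instead of at
        // every call site via wrap_store!.
        let adapter = ForeignKVStoreAdapter::new(store);
        Arc::new(KVStore { inner: Arc::new(adapter) })
    }
}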

Comment thread src/builder.rs Outdated
}

let store = wrap_store!(Arc::new(tier_store));
self.build_with_store(node_entropy, store)
Collaborator

I think we need to use build_with_store_internal here to make sure we're using the same Runtime etc.

Comment thread src/io/tier_store.rs Outdated
Arc::clone(&outer_lock.entry(locking_key).or_default())
}

async fn execute_locked_write<
Collaborator

I'm confused, why are we replicating all the write-ordering logic here? Given the implementations are required to fulfill LDK's requirements already, can't we just call the inner write and be done with it?

Contributor Author

Again, the reasoning here is that users might not implement the KVStore trait for their own stores as required, and thus the onus of correctness would fall to us: if we can't get the guarantees from their inner stores, the TierStore wrapper should provide some. I do agree that if the inner store already satisfies LDK's requirements, the wrapper shouldn't need to duplicate that logic. The write-ordering has been removed.

Comment thread src/ffi/types.rs Outdated
) -> Pin<Box<dyn Future<Output = Result<(), lightning::io::Error>> + Send>> {
let inner = self.inner.clone();

let primary_namespace = primary_namespace.to_string();
Collaborator

I think here (and in remove) we need to add the write-ordering logic to ensure we follow LDK's KVStore::write requirements.

Contributor Author

This has now been added 1ba4831
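
A minimal sketch of the per-key write-ordering referred to here (names and signatures assumed; tokio's Mutex queues waiters FIFO, so holding a per-key lock across the inner call serializes same-key writes):

use std::collections::HashMap;
use std::sync::{Arc, Mutex};

use tokio::sync::Mutex as AsyncMutex;

#[derive(Default)]
struct KeyLocks {
    // One async lock per fully qualified (namespace, sub-namespace, key).
    locks: Mutex<HashMap<(String, String, String), Arc<AsyncMutex<()>>>>,
}

impl KeyLocks {
    fn lock_for(&self, pns: &str, sns: &str, key: &str) -> Arc<AsyncMutex<()>> {
        let mut locks = self.locks.lock().unwrap();
        Arc::clone(
            locks.entry((pns.to_owned(), sns.to_owned(), key.to_owned())).or_default(),
        )
    }
}

// In the wrapper's async write/remove, hold the key's lock across the inner
// call so writes to the same key are applied one at a time:
//
//   let lock = self.key_locks.lock_for(&pns, &sns, &key);
//   Box::pin(async move {
//       let _guard = lock.lock().await;
//       inner.write(&pns, &sns, &key, buf).await
//   })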

Comment thread src/types.rs Outdated
/// A type alias for [`SyncAndAsyncKVStore`] with `Sync`/`Send` markers;
pub type DynStore = dyn SyncAndAsyncKVStore + Sync + Send;
#[cfg(feature = "uniffi")]
pub(crate) use crate::DynStore;
Collaborator

This is odd. Why do we need this?

Contributor Author

I had initially used the same name and the feature gating became necessary to differentiate the types at call sites when uniffi was enabled.

Comment thread src/io/test_utils.rs Outdated
}
}

pub struct DelayedStore {
Collaborator

Not quite sure what coverage we gain with DelayedStore? IMO, we might be better off dropping all this boilerplate.

Contributor Author

This has been removed here cb66b59.
I couldn't think of any better way to test that backup queue overflow doesn't impact primary writes. I agree that the boilerplate seems too much for just that single case.

Comment thread src/ffi/types.rs Outdated
) -> Result<Vec<String>, IOError>;
}

pub struct ForeignKVStoreAdapter {
Collaborator

Hmm, any reason this needs to be separate from DynStore? Couldn't we merge them?

Contributor Author

At the time, I believe I introduced the extra wrapper due to the limitation on implementing foreign traits for foreign types, i.e. I couldn't implement KVStore for Arc<dyn LdkSyncAndAsyncKVStore>. With the new changes this is no longer relevant, and the wrapper has been removed.

Comment thread src/ffi/types.rs Outdated
}
}

#[async_trait]
Contributor

Probably not worth taking a dependency just to desugar impl Future<Output = Result<(), IOError>> + Send + Sync + 'a.

Collaborator

Probably not worth taking a dependency just to desugar impl Future<Output = Result<(), IOError>> + Send + Sync + 'a.

Well, it's 'the official'/supported way to do async traits with Uniffi: https://mozilla.github.io/uniffi-rs/latest/futures.html#exporting-async-trait-methods

Contributor

Does uniffi require that in some way? It's just a comically overkill way to sugar impl Future, which is kinda nuts...

Contributor

Oh, worse, it looks like async_trait desugars to the Pin<Box<dyn ...>> version...which is dumb but at least for uniffi it shouldn't matter cause we'd have to do that anyway.
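
For context, the desugaring in question is roughly the following (trait and method shapes illustrative), i.e. the same Pin<Box<dyn Future>> signature the trait would otherwise spell out by hand:

use std::future::Future;
use std::pin::Pin;

trait ForeignKVStore: Send + Sync {
    // What #[async_trait] generates for `async fn write(...) -> Result<(), IOError>`:
    fn write<'a>(
        &'a self, primary_namespace: String, secondary_namespace: String,
        key: String, buf: Vec<u8>,
    ) -> Pin<Box<dyn Future<Output = Result<(), IOError>> + Send + 'a>>;
}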

Collaborator

@tnull tnull left a comment


Unfortunately this will need a substantial rebase now that #696 landed, sorry for that!

Besides some tedious rebase work, we now have DynStoreWrapper which can be used on the ffi and the non-ffi side. I do wonder if we can actually just merge that wrapper with the TierStore, as well as the ForeignKVStoreAdapter/DynStore. Seems all these wrapper structs may not be needed and we could just get away with one trait and one wrapper struct, mostly?

@enigbe enigbe force-pushed the 2025-10-tiered-data-storage branch 7 times, most recently from 19a6c09 to 7720935 Compare April 8, 2026 07:12
@enigbe
Contributor Author

enigbe commented Apr 8, 2026

Hi @tnull @TheBlueMatt,

Quick update on the rework here.

Following the review feedback, I’ve now split this into a 2-PR stack so the core storage changes can be reviewed separately from the FFI surface and binding tests:

  • #692: the Rust-side TierStore implementation and native NodeBuilder integration
  • #871: FFI exposure plus Rust/Python integration coverage on top

As part of that rework, I also incorporated a number of design changes from the earlier version of this PR:

  • The retry configuration / nested retry handling has been dropped from the core design for now
  • The old best-effort queued backup path has been removed. Backup persistence is now part of the foreground success path for primary-backed writes/removals: when a backup store is configured, the primary and backup operations are issued in parallel and success is only reported once both complete (see the sketch after this list). This gives a stronger and clearer durability guarantee for the primary+backup case that motivated the feature, while also simplifying the implementation compared to the earlier best-effort async queueing approach
  • KVStoreSync now delegates to the inner stores’ sync implementations rather than sharing TierStore’s async/runtime path, so wrapped stores can preserve their own sync behavior
  • Write-ordering concerns are no longer duplicated in TierStore; where ordering matters across the FFI boundary, it is handled in the FFI-facing store wrapper instead
  • The FFI-specific surface area and end-to-end binding tests have been moved into the follow-up PR so this PR stays focused on the Rust-side storage implementation and integration
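
A sketch of that foreground dual-write path, assuming lightning's async KVStore shape (signatures and helper name illustrative):

async fn write_with_backup(
    primary: &DynStore, backup: Option<&DynStore>, pns: &str, sns: &str,
    key: &str, buf: Vec<u8>,
) -> Result<(), lightning::io::Error> {
    match backup {
        Some(backup) => {
            // Issue both writes concurrently; report success only once
            // both durable stores have completed.
            let (p, b) = tokio::join!(
                primary.write(pns, sns, key, buf.clone()),
                backup.write(pns, sns, key, buf),
            );
            p.and(b)
        },
        None => primary.write(pns, sns, key, buf).await,
    }
}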

@enigbe enigbe force-pushed the 2025-10-tiered-data-storage branch from 7720935 to 181db03 Compare April 9, 2026 07:24
@tnull tnull self-requested a review April 13, 2026 12:23
Comment thread src/io/tier_store.rs Outdated
// http://www.apache.org/licenses/LICENSE-2.0> or the MIT license <LICENSE-MIT or
// http://opensource.org/licenses/MIT>, at your option. You may not use this file except in
// accordance with one or both of these licenses.
#![allow(dead_code)] // TODO: Temporal warning silencer. Will be removed in later commit.
Collaborator

Let's not introduce this in the first place then. Warnings are fine to have for a commit.

Comment thread src/io/tier_store.rs Outdated
use std::future::Future;
use std::sync::Arc;

/// A 3-tiered [`KVStoreSync`] implementation that routes data across storage
Collaborator

KVStore, not KVStoreSync, right?

Comment thread src/io/tier_store.rs Outdated
/// - an optional ephemeral store for non-critical, rebuildable cached data.
///
/// When a backup store is configured, writes and removals for primary-backed data
/// are issued to the primary and backup stores concurrently and only succeed once
Collaborator

Hmm, is this really what we want? If the backup store is remote and we fail to persist, should we really abort? Doesn't this make our operations less reliable in practice instead of more, if we always fail on the weakest link?

Also, if we simply abort if one of the writes fails, how could we recover without inconsistencies? We don't have a good way to roll-back the write that went through, right? So we'd still have inconsistencies, so wouldn't it be preferable to not fail the write in the first place?

Contributor Author

Hmm, is this really what we want? If the backup store is remote and we fail to persist, should we really abort? Doesn't this make our operations less reliable in practice instead of more, if we always fail on the weakest link?

The current implementation follows Matt's recommendation, where we assume a simplified configuration with primary-remote and backup-local. In this case, we should have high confidence in local backup rarely failing. However, if it does, I do agree that we don't want to propagate its error, especially if primary-remote succeeds. I could modify it so that we log backup-local failures and only ever fail if primary fails.

Also, if we simply abort if one of the writes fails, how could we recover without inconsistencies? We don't have a good way to roll-back the write that went through, right? So we'd still have inconsistencies, so wouldn't it be preferable to not fail the write in the first place?

I have thought about this and wonder if we should explore syncing by extending DynStoreTrait with list_all_keys and use LDK's existing migrate_kv_store_data to reconcile backup from primary at startup. This would require that primary and backup both impl MigratableKVStore. I'd like to know what you think about this approach.

Regarding primary-local, backup-remote, Matt suggested that I shelve this consideration for a possible later follow-up, if we can come up with a more robust solution than I initially implemented. Any pointers here will be deeply appreciated.

Collaborator

The current implementation follows Matt's recommendation, where we assume a simplified configuration with primary-remote and backup-local. In this case, we should have high confidence in local backup rarely failing. However, if it does, I do agree that we don't want to propagate its error, especially if primary-remote succeeds. I could modify it so that we log backup-local failures and only ever fail if primary fails.

Hum, maybe it might be best to discuss all options we have offline @TheBlueMatt ? Especially if we also want to support remote backup (which is reasonable/important, IMO), failing if either operation fails seems not great? However, if we only want to support a local mirror for now, maybe we should just simplify everything, and make enabling the backup store a binary option that enables an SQLite-based backup store? As a side effect, the user could easily copy the database to recover?

I have thought about this and wonder if we should explore syncing by extending DynStoreTrait with list_all_keys and use LDK's existing migrate_kv_store_data to reconcile backup from primary at startup. This would require that primary and backup both impl MigratableKVStore. I'd like to know what you think about this approach.

Yeah, likely using MigratableKVStore is the way to go, but maybe we should just leave this as a manual step rather than doing auto-recovery? We do already expose all necessary utilities for it, no?

Regarding primary-local, backup-remote, Matt suggested that I shelve this consideration for a possible later follow-up, if we can come up with a more robust solution than I initially implemented. Any pointers here will be deeply appreciated.

Yeah, okay, if we really don't want to do remote backups for now, we should just keep a mirrored SQLite store at a configurable location, IMO, and not allow the user to configure anything else. I do however want to note that people might still try to misuse this, e.g., by pointing the storage location of the mirror to a remote file system (ask me how I know: I did the same thing for a while to keep my CLN node at the time backed up).

Contributor
Contributor

@TheBlueMatt TheBlueMatt Apr 28, 2026


IMO if we want to move away from failing if either backend fails we need to have proper recovery logic in place. If we have a remote backup and writing to it fails, it otherwise wouldn't be obvious that restoring from the "live backup" is suddenly unsafe. Even making the write complete early suddenly makes using the "live backup" unsafe after a shutdown if we were mid-write.

It's great in theory to provide an optimistic live backup, but absent a way to actually use it safely I'm really not convinced it's a great idea to ship it.

Collaborator

Right, so IMO the short term solution is to hardcode backups to local SQLite, and treat write failures the same as persistence failures to the primary store.

Ofc, that won't save us entirely from the two stores getting out of sync, as we still won't have rollbacks in place in case one write succeeds and the other doesn't. Question is if that means that we can at least parallelize instead of sequencing the writes.

Contributor

That should be fine, though, as long as async writes aren't completed until both complete. IOW, it should always be the case that we can restart in either state when there's a pending uncompleted write when we stop. I don't see any reason to not parallelize the writes.

Contributor

I guess we didn't strictly have any kind of guarantees around it being allowed for write A to be incomplete, then on restart missing, then later completed (in case we switched the "primary store" after a later restart). At least in the lightning crate I think that's fine - for monitors it'd only matter in cases where another update happened, but that would imply overwriting the previous dangling write to A. Similar is obviously true for nearly everything else, where we're constantly overwriting the latest state.

Contributor

I guess we should document this on KVStore so that it's a formal part of the API.

Comment thread src/io/tier_store.rs Outdated
/// Reads and lists do not consult the backup store during normal operation.
/// Ephemeral data is read from and written to the ephemeral store when configured.
///
/// Note that dual-store writes and removals are not atomic across the primary and
Collaborator

Okay, as mentioned above, I noted that: but what can I do now if a write failed? How can the user actually take action to avoid this or recover from it?

Contributor Author

With the proposed change (only fail if primary fails), the user never needs to act on backup failures; we can just log them and reconcile at startup. Primary failure handling remains as is.

Contributor Author

I have spent some time rethinking this and considered the option of allowing users to configure how they'd like to back up data, i.e. either best-effort or semi-synchronously. In the latter case, in addition to the synchronous backup store, I have added an asynchronous retry queue backed by a local FilesystemStoreV2 where we try to durably persist our backup intent.

We offer guarantees in semi-sync mode based on the durable persistence of primary write and backup intent, surfacing a write failure to the caller only if the backup intent cannot be persisted to the local retry queue, even though the primary write may have already succeeded. I believe this is better than the alternative where we silently lose backup coverage without the caller knowing.

This does create a scenario where the backup store can drift behind the primary: if the node crashes after the primary write succeeds but before backup intent to retry queue is persisted, the backup will be missing those operations. The backup will be internally consistent but stale, and drift between the stores will form as the retry queue will be unaware of backup intent(s) that didn't persist. I'd like to understand whether this guarantee is sufficient for the restore use case, or whether we'd also need active reconciliation between the stores.

This approach was inspired by how DDIA describes semi-synchronous replication in single-leader replication systems. I also think it addresses some of the concerns @TheBlueMatt raised about primary-local and backup-remote configurations.

Would be really grateful to get both your thoughts on the approach taken.
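
To make the semi-sync flow concrete, a rough sketch (PendingBackupOp and the retry queue are from this PR; method names and signatures are assumed):

async fn write_semi_sync(
    &self, pns: &str, sns: &str, key: &str, buf: Vec<u8>,
) -> Result<(), lightning::io::Error> {
    // The primary write is authoritative; its error is always surfaced.
    self.primary.write(pns, sns, key, buf.clone()).await?;

    if let Err(e) = self.backup.write(pns, sns, key, buf.clone()).await {
        log_error!(self.logger, "Backup write of {}/{}/{} failed: {}", pns, sns, key, e);
        // Durably record the backup intent so the background retry task can
        // replay it later. Only a failed enqueue surfaces an error, since at
        // that point backup coverage would otherwise be lost silently.
        let op = PendingBackupOp::write(pns, sns, key, buf);
        self.retry_queue.enqueue(op)?;
    }
    Ok(())
}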

Collaborator

Thanks for thinking through this, however, it would be great if we could move any more advanced scheme like a backup queue to a follow-up PR as that might need some more review time and we should aim to land this PR (and the bindings follow-up) soon. So, as mentioned above, how about we just make the backup store an SQLite store with a configurable location, requiring both writes to succeed. Then, in a follow-up we can explore the queuing concept.

For one that gives us more flexibility with regards to when this follow-up lands, but it would also allow us to further think through, e.g., how to deal with the overlap with a next-gen VssStore, which should also start using a local read cache, and how all of this would interact with the generally keeping only a LRU page cache of the (payment) stores in memory and reading the entries from the stores on demand (which we also want to finally do soon).

Comment thread src/builder.rs Outdated
/// primary and backup stores complete successfully.
///
/// If not set, durable data will be stored only in the primary store.
#[allow(dead_code)] // Used by subsequent FFI/test integration commits.
Collaborator

Why is this allowing dead_code?

Contributor Author
Contributor Author

@enigbe enigbe Apr 14, 2026


I added this to silence warnings as I didn't think warnings were permissible in intermediate commits. Since you've clarified that they are, I can remove them from intermediate commits. However, the annotations on the final commit are still needed (the code isn't used except in #871), so CI will fail without them.

Comment thread src/builder.rs
self
}

/// Configures the backup store for local disaster recovery.
Collaborator

So here we set the backup store, but how do we envision the restore to work? Should that be part of the recovery_mode?

Contributor Author

There are two approaches we can take here. For the first, have both primary and backup concrete stores implement MigratableKVStore and have the user call migrate_kv_store_data before building. It's simple but not ideal as it's not part of any existing node or builder APIs, and would require explicit documentation.

Alternatively, we can add a restore_from_backup(backup) method on NodeBuilder and have the migration/restoration of data from backup (source) to primary (target) happen inside build. This requires adding list_all_keys to DynStoreTrait, and MigratableKVStore for both stores so the migration can work through the type-erased Arc<DynStore> layer.

For the second approach, we could also refactor recovery_mode from a bool into a struct:

pub struct RecoveryMode {
    pub wallet: bool, // resync on-chain wallet from genesis
    pub backup: bool, // restore persisted data from backup store
}

This keeps both recovery concerns under one concept while remaining independent. A user restoring from backup may not need a full wallet resync, and vice versa.

Contributor Author
Contributor Author

@enigbe enigbe Apr 27, 2026


On the restore side, I went with the second approach from my earlier comment: restore_from_backup() on the builder, with restoration happening inside build before normal startup reads. I refactored recovery_mode from a bool into a RecoveryConfig struct that keeps wallet recovery and backup restore independent. The restore itself enumerates the backup store via list_all_keys, filters to a known durable key inventory, checks that the primary store is empty, and copies only matching entries. This keeps restore policy explicit rather than blindly migrating everything from the backup store.
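
A condensed sketch of that restore pass (list_all_keys comes from MigratableKVStore; the helper names are assumed):

fn restore_durable_state(
    backup: &DynStore, primary: &DynStore,
) -> Result<(), lightning::io::Error> {
    // Refuse to overwrite an already-populated primary store.
    if primary_contains_durable_state(primary)? {
        return Err(lightning::io::Error::new(
            lightning::io::ErrorKind::Other,
            "primary store already contains durable state",
        ));
    }
    for (pns, sns, key) in backup.list_all_keys()? {
        // Copy only keys in the known durable inventory rather than blindly
        // migrating everything found in the backup store.
        if !is_known_durable_key(&pns, &sns, &key) {
            continue;
        }
        let buf = backup.read(&pns, &sns, &key)?;
        primary.write(&pns, &sns, &key, &buf)?;
    }
    Ok(())
}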

@tnull tnull added this to the 0.8 milestone Apr 20, 2026
enigbe added 5 commits April 27, 2026 07:17
This commit:

Adds `TierStore`, a tiered `KVStore`/`KVStoreSync` implementation that
routes node persistence across three storage roles:

- a primary store for durable, authoritative data
- an optional backup store for a second durable copy of primary-backed data
- an optional ephemeral store for rebuildable cached data such as the
  network graph and scorer

TierStore routes ephemeral cache data to the ephemeral store when
configured, while durable data remains primary+backup. Reads and lists
do not consult the backup store during normal operation.

For primary+backup writes and removals, this implementation treats the
backup store as part of the persistence success path rather than as a
best-effort background mirror. Earlier designs used asynchronous backup
queueing to avoid blocking the primary path, but that weakens the
durability contract by allowing primary success to be reported before
backup persistence has completed. TierStore now issues primary and backup
operations together and only returns success once both complete.

This gives callers a clearer persistence guarantee when a backup store is
configured: acknowledged primary+backup mutations have been attempted
against both durable stores. The tradeoff is that dual-store operations
are not atomic across stores, so an error may still be returned after one
store has already been updated.

TierStore also implements `KVStoreSync` in terms of dedicated synchronous
helpers that call the wrapped stores' sync interfaces directly. This
preserves the inner stores' synchronous semantics instead of routing sync
operations through a previously held async runtime.

Additionally, adds unit coverage for the current contract, including:
- basic read/write/remove/list persistence
- routing of ephemeral data away from the primary store
- backup participation in the foreground success path for writes and removals
Add native builder support for tiered storage by introducing
`TierStoreConfig`, backup and ephemeral store configuration methods,
and a `build_with_dynstore` path that wraps the provided store as the
primary tier and applies any configured secondary tiers.

This makes tiered storage a builder concern while keeping the Rust
`build_with_store` API ergonomic for native callers.

Note: The temporary `dead_code` allowance will be removed in a
follow-up commit once the new tier-store builder APIs are exercised.
improve TierStore documentation
silently log backup failure
@enigbe enigbe force-pushed the 2025-10-tiered-data-storage branch 3 times, most recently from f59ec3c to bc5c1fb Compare April 27, 2026 07:39
enigbe added 3 commits April 27, 2026 09:09
This is an exploratory commit to:

- Introduce BackupMode with BestEffortBackup and SemiSync semantics,
  plus a locally persisted backup retry queue for failed backup writes
  and removals.
- Add serialization for pending backup ops, deterministic
  queue encoding, durable enqueue semantics for SemiSync, and a retry
  task skeleton with backoff and stale-op detection.

Also update TierStore and queue docs to reflect the new backup-mode
model and at-least-once cleanup semantics.
Replace TierStoreInner's raw backup DynStore with BackupStore, which
holds the backup store plus an optional BackupRetryQueue.

Update write/remove backup result handling to accept PendingBackupOp
and enqueue failed backup operations when a retry queue is present;
otherwise only log the backup failure.

This makes the configured backup semantics explicit: best-effort mode
logs backup failures, while semisync mode requires durably recording
failed backup intents for later retry.

Adjust set_backup_store, builder wiring, and docs for the new backup
configuration shape.
Thread BackupMode through TierStoreConfig and update backup-store
configuration to distinguish best-effort backup writes from semisync
behavior.

Build a local retry queue store for SemiSync during tier-store
construction, retain the concrete TierStore on Node, and spawn the
background backup retry task during Node::start() with shutdown
integration via stop_sender.

Also update TierStore backup result handling to enqueue concrete
PendingBackupOp values for durable retry, and refresh the related
backup and retry-task documentation.
@enigbe enigbe force-pushed the 2025-10-tiered-data-storage branch from bc5c1fb to 5da76a8 Compare April 27, 2026 08:09
enigbe added 4 commits April 27, 2026 09:51
Refactor tier_store unit tests around shared filesystem-backed test
fixtures and add failure-injection helpers for backup and retry stores.

Cover best-effort backup failure handling, semisync retry enqueueing,
queue-persist failure behavior, retry-task replay of queued writes and
removes, remove-not-found idempotence, retry-queue restart reload,
dedup behavior across replacement sequences, stale snapshot skipping,
and pending-op serialization round-trips.
Extend the native store plumbing to support key enumeration for store
migration by wiring `MigratableKVStore` through `DynStoreWrapper` and
implementing it for LDK-native backends, including `SqliteStore`,
`VssStore`, and tiered storage, plus the in-repo test stores needed to
keep the migration path exercised.

This lays the plumbing for backup restoration and reconciliation by
making migration a first-class capability of the stores ldk-node
natively owns and configures, ensuring they can exhaustively enumerate
their persisted keys.
Introduce a restore path that copies durable state from a configured
backup store into an empty primary store before normal node
initialization.

This commit:
- Adds a recovery module to define the durable restore scope, filter known
  durable keys, restore them into the primary store, and detect when a
  primary already contains durable state.
- Wires this into the builder by separating backup restore from wallet recovery,
  adding restore-specific build errors, and running restore before any normal
  startup reads.

Also, covers the new logic with unit tests plus an integration test that exercises
backup population, restore into a fresh primary, and successful node boot with
preserved identity.
Delete unused enqueue_async method on the backup retry queue
@enigbe enigbe force-pushed the 2025-10-tiered-data-storage branch from 5da76a8 to dec4aa7 Compare April 27, 2026 08:52